Extracting Invertible Translations from Pre-aligned Texts
Author
Abstract
This paper presents an approach to extracting invertible translations from pre-aligned bilingual texts. The extracted set of invertible translations is unambiguous because each string occurs only once on either language side. Two variants of the algorithm are presented, using different knowledge resources. The knowledge-rich variant makes use of a bilingual lexicon in addition to a morphological analyser and a shallow syntax formalism, both of which are also used in the knowledge-poor variant. It is shown that the knowledge-rich method yields better results than the knowledge-poor method.
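The invertibility property described in the abstract (each string occurs only once on either language side) can be illustrated with a minimal sketch. This is not the paper's extraction algorithm, which relies on a morphological analyser, a shallow syntax formalism and, in one variant, a bilingual lexicon. The function names `invertible_subset` and `is_invertible` and the English-German candidate pairs are hypothetical, chosen only for illustration.

```python
from collections import Counter


def invertible_subset(pairs):
    """Keep only the pairs whose source AND target strings are unique.

    Illustrative filter only (an assumption, not the paper's method):
    any string that takes part in more than one distinct pair is
    treated as ambiguous, and every pair containing it is dropped.
    """
    unique_pairs = set(pairs)  # ignore exact duplicate pairs
    src_counts = Counter(src for src, _ in unique_pairs)
    tgt_counts = Counter(tgt for _, tgt in unique_pairs)
    return sorted(
        (src, tgt)
        for src, tgt in unique_pairs
        if src_counts[src] == 1 and tgt_counts[tgt] == 1
    )


def is_invertible(pairs):
    """True if no string is repeated on either language side."""
    sources = [src for src, _ in pairs]
    targets = [tgt for _, tgt in pairs]
    return (len(set(sources)) == len(sources)
            and len(set(targets)) == len(targets))


# Hypothetical candidate translations; "bank" is ambiguous on the source side.
candidates = [
    ("house", "Haus"),
    ("bank", "Bank"),
    ("bank", "Ufer"),
    ("green house", "grünes Haus"),
]

kept = invertible_subset(candidates)
print(kept)
```

Because the kept set is a one-to-one mapping, it can be read in either direction: swapping source and target in every pair yields an equally unambiguous table for the reverse translation direction.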
Similar resources
Inducing probabilistic invertible translation grammars from aligned texts
This paper presents an algorithm for extracting invertible probabilistic translation grammars from bilingual aligned and linguistically bracketed text. The invertibility condition requires all translation ambiguities to be resolved in the final translation grammar. The paper examines the complexity of inducing translation grammars and proposes a number of heuristics to reduce the theoretical...
ParaConc: Concordance Software for Multilingual Parallel Corpora
Parallel concordance software provides a general purpose tool that permits a wide range of investigations of translated texts, from the analysis of bilingual terminology and phraseology to the study of alternative translations of a single text. This paper outlines the main features of a Windows concordancer, ParaConc, focussing on alignment of parallel (translated) texts, general search procedu...
An Approach to Acquire Word Translations from Non-parallel Texts
Few approaches to extract word translations from non-parallel texts have been proposed so far. Researchers have not been encouraged to work on this topic because extracting information from non-parallel corpora is a difficult task producing poor results. Whereas for parallel texts, word translation extraction can reach about 99%, the accuracy for non-parallel texts has been around 72% up to now...
Translation as Annotation
In this paper we illustrate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the key notion that translating a text can be seen as a linguistic annotation task which is easier than manual annotation with formal schemes. After translation, formal annotations can be automatically derived...
Mining Parenthetical Translations for Polish-English Lexica
Documents written in languages other than English sometimes include parenthetical English translations, usually for technical and scientific terminology. Techniques have been developed for extracting such translations (as well as transliterations) from large Chinese text corpora. This paper presents methods for mining parenthetical translations in Polish texts. The main difference between translati...